Mining Concept Drift from Data Streams by Unsupervised Learning
نویسندگان
چکیده
Mining is involved with knowing the unknown characteristics from the databases or gaining of Knowledge (Knowledge Discovery) from Databases to get more useful information from the database. Real time databases which are constantly changing with time, there may arise a point when traditional Data Mining techniques may not be adequate as there may be a previously unknown class label involved or new properties of data which need to be taken into consideration. Thus as time passes and new data is in the dataset, the model predicted by the data mining techniques may become less accurate. This phenomenon is known as Concept Drift. The meaning of Concept Drift is the statistical properties of the target variable, i. e. how the properties of the target variable change over the course of time. The basic idea behind the ?Mining Concept Drift from Data Stream by Unsupervised Learning? is to detect the Concept Drift present in the Data Stream, which is used in majority of Web-Based Applications like Fraud Detection & Span E-mail Filtering etc. The approach taken here is both for the Offline Approach & an Online Approach, which can be easily merged with the current Web-Based Applications. Some examples for Concept Drift are – In a fraud detection application the target concept may be a binary attribute FRAUDULENT with values "yes" or "no" that
منابع مشابه
Concept drift detection in business process logs using deep learning
Process mining provides a bridge between process modeling and analysis on the one hand and data mining on the other hand. Process mining aims at discovering, monitoring, and improving real processes by extracting knowledge from event logs. However, as most business processes change over time (e.g. the effects of new legislation, seasonal effects and etc.), traditional process mining techniques ...
متن کاملLearning from Data Streams with Concept Drift
Increasing access to incredibly large, nonstationary datasets and corresponding demands to analyse these data has led to the development of new online algorithms for performing machine learning on data streams. An important feature of real-world data streams is " concept drift, " whereby the distributions underlying the data can change arbitrarily over time. The presence of concept drift in a d...
متن کاملDetecting Concept Drift in Data Stream Using Semi-Supervised Classification
Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...
متن کاملNew Ensemble Method for Classification of Data Streams
Classification of data streams has become an important area of data mining, as the number of applications facing these challenges increases. In this paper, we propose a new ensemble learning method for data stream classification in presence of concept drift. Our method is capable of detecting changes and adapting to new concepts which appears in the stream. Data stream classification; concept d...
متن کاملInfo-fuzzy algorithms for mining dynamic data streams
Most data mining algorithms assume static behavior of the incoming data. In the real world, the situation is different and most continuously collected data streams are generated by dynamic processes, which may change over time, in some cases even drastically. The change in the underlying concept, also known as concept drift, causes the data mining model generated from past examples to become le...
متن کامل